[CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI #2065
Signed-off-by: wangli <wangli858794774@gmail.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #2065 +/- ##
==========================================
- Coverage 73.85% 73.83% -0.03%
==========================================
Files 103 96 -7
Lines 11425 10865 -560
==========================================
- Hits 8438 8022 -416
+ Misses 2987 2843 -144
cc @wangxiyuan @ganyi1996ppo @jianzs @ApsarasX @zzzzwwjj @yiz-liu @whx-sjtu @Angazenn @mengwei805 FYI, after this PR we will use A2 (64GB) in CI.
https://github.com/vllm-project/vllm-ascend/blob/main/benchmarks/scripts/run_accuracy.py seems to need the same change.
Many functions should also be added to the YAML; this can be done in a new PR.
Should be included.
… up CI (vllm-project#2065)

### What this PR does / why we need it?
Our full CI workflow currently takes about 3 hours in total, which seriously hurts the developer experience, so an optimization is urgent. After this PR, the running time of the full CI is expected to drop to 1h40min.

- Enable linux-aarch64-a2 (64GB) to replace linux-arm64-npu (32GB)
- Change TP4 ---> TP2 * 2 max-parallel
- Move DeepSeek-V2-Lite-W8A8 to single card test

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@a248025

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
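The "TP4 ---> TP2 * 2 max-parallel" change in the commit message above could look roughly like the following GitHub Actions fragment. This is a minimal sketch, not the repository's actual workflow file: the workflow name, job name, shard names, and the `echo` command are all hypothetical placeholders, and the real runner label may carry a suffix that the PR text does not show. Only `strategy.max-parallel`, `strategy.matrix`, and `runs-on` are standard workflow keys.

```yaml
# Hypothetical sketch; names below are placeholders, not taken from the repository.
name: e2e-multicard-sketch
on: pull_request

jobs:
  e2e-tp2:
    strategy:
      # Allow at most two matrix jobs to run at the same time
      # (the "* 2 max-parallel" part of the change).
      max-parallel: 2
      fail-fast: false
      matrix:
        # Placeholder shards; the PR text does not say how tests are split.
        shard: [group-a, group-b]
    # Runner label as named in the PR description; the exact label used
    # in the real workflow is an assumption.
    runs-on: linux-aarch64-a2
    steps:
      - uses: actions/checkout@v4
      - name: Run TP2 tests for one shard
        # Placeholder command standing in for the real test invocation.
        run: echo "running TP2 tests for shard ${{ matrix.shard }}"
```

The idea is that two tensor-parallel-2 jobs can run side by side instead of one tensor-parallel-4 job at a time, which is one plausible source of the wall-clock reduction the PR expects.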
… up CI (vllm-project#2065)

### What this PR does / why we need it?
Our full CI workflow currently takes about 3 hours in total, which seriously hurts the developer experience, so an optimization is urgent. After this PR, the running time of the full CI is expected to drop to 1h40min.

- Enable linux-aarch64-a2 (64GB) to replace linux-arm64-npu (32GB)
- Change TP4 ---> TP2 * 2 max-parallel
- Move DeepSeek-V2-Lite-W8A8 to single card test

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@a248025

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
### What this PR does / why we need it?
Switch infra to linux-aarch64-a2 and Python to 3.11.

Soft backport: #2065
Soft backport: #2072

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

search all: `linux-arm64-npu` and `3.10`

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
What this PR does / why we need it?
Our full CI workflow currently takes about 3 hours in total, which seriously hurts the developer experience, so an optimization is urgent. After this PR, the running time of the full CI is expected to drop to 1h40min.
Does this PR introduce any user-facing change?
No
How was this patch tested?